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Abstract Territorial subdivisions and geographic borders are essential for under- 
standing phenomena in sociology, political science, history, and economics. They 
influence the interregional flow of information and cross-border trade and affect the 
diffusion of innovation and technology. However, most existing administrative bor- 
ders were determined by a variety of historic and political circumstances along with 
some degree of arbitrariness. Societies have changed drastically, and it is doubt- 
ful that currently existing borders reflect the most logical divisions. Fortunately, at 
this point in history we are in a position to actually measure some aspects of the 
geographic structure of society through human mobility. Large-scale transportation 
systems such as trains and airlines provide data about the number of people traveling 
between geographic locations, and many promising human mobility proxies are be- 
ing discovered, such as cell phones, bank notes, and various online social networks. 
In this chapter we apply two optimization techniques to a human mobility proxy 
(bank note circulation) to investigate the effective geographic borders that emerge 
from a direct analysis of human mobility. 
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1 Introduction 

The geographic compartmentalization of maps into coherent territorial units is not 
only essential for the management and distribution of administrative responsibil- 
ities and the allocation of public resources. Territorial subdivisions also serve as 
an important frame of reference for understanding a variety of phenomena related 
to human activity. Existing borders frequently correlate with cultural and linguis- 
tic boundaries or topographical features ll34l [TTll . they represent essential factors 
in trade and technology transfer lfT9l |29| . and they indirectly shape the evolution 
of human-mediated dynamic processes such as the spread of emergent infectious 
diseases ||2n[T2l[TT][j21. 

The majority of existing administrative and political borders, for example in the 
United States and Europe, evolved over centuries and typically stabilized many 
decades ago, during a time when human interactions and mobility were predomi- 
nantly local and the conceptual separation of spatially extended human populations 
into a hierarchy of geographically coherent subdivisions was meaningful and plau- 
sible. 

However, modern human communication and mobility has undergone massive 
structural changes in the past few decades lf34l [30l . Efficient communication net- 
works, large-scale and widespread social networks, and more affordable long- 
distance travel generated highly complex connectivity patterns among individuals 
in large-scale human populations |i3,i9J. Although geographic proximity still dom- 
inates human activities, increasing interactions over long distances 17] |25] |38]| and 
across cultural and political borders amplify the small- world effect BTl [311 and 
decrease the relative importance of local interactions. 

Human mobility networks epitomize the complexity of multi-scale connectivity 
in human populations (see Figure [T]i. More than 17 million passengers travel each 
week across long distances on the United States air transportation network alone. 
However, including all means of transportation, 80% of all traffic occurs across 
distances less than 50 km QEl. The coexistence of dominant short-range and sig- 
nificant long-range interactions handicaps efforts to define and assess the location 
and structure of effective borders that are implicitly encoded in human activities 
across distance. The paradigm of spatially coherent communities may no longer be 
plausible, and it is unclear what structures emerge from the interplay of interactions 
and activities across spatial scales |l2l 00] IH |25l . This difficulty is schematically 
illustrated in Figure |2] Depending on the ratio of local versus long-range traffic, 
one of two structurally different divisions of subpopulations is plausible. If short- 
range traffic outweighs long-range traffic, local, spatially coherent subdivisions are 
meaningful. Conversely, if long-range traffic dominates, subdividing into a single, 
spatially de-coherent urban community and disconnected suburban modules is ap- 
propriate and effective geographic borders are difficult to define in this case. 

Although previous studies identified community structures in long-range mobil- 
ity networks based on topological connectivity [28, ,36|, this example illustrates that 
the traffic intensity resulting from the interplay of mobility on all spatial scales must 
be taken into account. Obtaining comprehensive, complete, and precise datasets on 
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Fig. 1 The Where's George? network. Multi-scale human mobility is characterized by dominant 
short-range and significant long-range connectivity patterns. The illustrated network represents a 
proxy for human mobility, the flux of bank notes between 3,109 counties in the lower 48 United 
States. Each link is represented by a line, the color scale encodes the strength of a connection from 
small (dark red) to large (bright yellow) values of Wjj spanning four orders of magnitude. 




Fig. 2 A simplified illustration of generic traffic patterns between and within metropolitan mobility 
hubs (A and B), with two types of connections wt and wo, local traffic connecting individual hubs 
to smaller nodes in their local environment (blue) and long-distance links connecting the hubs 
(red). Depending on the ratio of local and long-range flux magnitude, two qualitatively different 
modularizations are plausible. If wl 2> wo, two spatially compact communities are meaningful 
(left), whereas if W[, <S wo, the metropolitan centers belong to one geographically delocalized 
module (orange), effectively detached from their local environment, yielding three communities 
altogether (right). 

human mobility covering many spatial scales is a difficult task, and recent studies 
have followed a promising alternative strategy based on the analysis of proxies that 
permit indirect measurement of human mobility patterns ITTI l25l [38l [30l [TSl [T6l . 

We focus on one human mobility proxy, a dataset collected at the website wwwT] 
[wheresgeorge.com. This website hosts a bill-tracking game called Where's 
George? in which participants can tag an individual US banknote of any denomi- 
nation by logging in to the website and entering the bill's serial number along with 
their location. Subsequent participants who receive the bill may do the same, thereby 
recording a part of the spatial trajectory the bill follows during its lifetime. We use 
this information to construct a network whose nodes represent counties in the con- 
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tinental United States and whose edges encode the number of bills exchanged be- 
tween pairs of counties; details of this and a discussion of some statistics of the data 
are given in Section [3] 

Both of our analyses rest on the idea of finding community partitions of the 
network, that is, dividing all of the nodes into a set of mutually disjoint groups 
or communities. A community of nodes can be defined in many different ways, but 
all definitions try to capture some aspect of the intuitive idea of a community: a set 
of nodes that belong together, or are more similar to one another than they are to the 
rest of the population. 

Our first analysis in Section|2]uses a modularity maximization technique to iden- 
tify community partitions. Modularity is a method of scoring any given commu- 
nity partition in a network. A partition with a high modularity score has many more 
intra-group links, and fewer inter-group links, than expected by random chance. Our 
optimization algorithm searches for high-modularity partitions through a stochastic, 
simulated annealing process. 

We go on to determine community partitions in Section|6]by searching for nodes 
with similar topological features, namely their shortest-path tree. Each node is the 
root of a shortest-path tree that comprises a minimal set of the strongest links con- 
necting that node to the rest of the network. By looking for topological similarities 
between shortest-path trees, we identify groups of nodes that have similar patterns 
of connectivity. 

With both methods, once a community partition is identified, a corresponding 
geographic border structure is produced simply by drawing borders between coun- 
ties that do not belong to the same community, and in Section |5] we discuss how a 
superposition of border structures alleviates some of the long-standing weaknesses 
of modularity maximization. The fact that communities tend to be spatially compact 
is one of the most surprising findings of this research, and we conclude in Section]?] 
by developing a method for comparing border structures and examining the degree 
to which effective mobility borders line up with various existing borders, such as 
state boundary lines, census areas, and economic areas. 



2 Network modularity 

This section introduces the modularity measure and describes the simulated anneal- 
ing algorithm we use for finding maximal-modularity community partitions. 

We assume here that W is a square, symmetric matrix that represents a symmet- 
ric, weighted network; the elements wij are nonnegative and measure the strength of 
the connection between nodes ; and j. Based on the idea that two nodes ; and /' are 
effectively proximal if w,y is large, we search for a community partition of the nodes 
that has a high value of modularityll fT3l l24l [35 1 . This standard network-theoretic 
measure of community structure prefers partitions such that the intra-connectivity 
of the modules in the partition is high and inter-connectivity between them is low as 
compared to a random null model. Given a partition P of the nodes into k modules 
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M„, the modularity Q{P) is defined as 

Q = Y.^Fn (1) 

n 

in which AF„ = F„ —F^ is the difference between F„, the fraction of total mobility 
within the module M„, and the expected fraction Fj^ of a random network with an 
identical weight distribution p{w). Q cannot exceed unity; high values indicate that 
a partition successfully groups nodes into modules, whereas random partitions yield 
2 « 0. Maximizing Q in large networks is an NP-hard problem |6|, but a variety of 
algorithms have been developed to systematically explore and sample the space of 
possible divisions in order to identify high-modularity partitions lfT3ll22ll . 



2.1 Finding optimal partitions 

As discussed in more detail in Section|4j our method relies on finding several differ- 
ent high-modularity partitions, which restricts the range of applicable algorithms. 
For example, the deterministic divisive algorithms described by Newman and Gk- 
van ll35l cannot find several different local maxima of the modularity function. In 
contrast, Monte Carlo algorithms return different partitions with probabilities that 
monotonically increase with the corresponding modularity values, one of which is 
the simulated annealing algorithm described by Guimera and Amaral [27 1. Addi- 
tionally, this algorithm has been found to perform the best in terms of correctly 
identifying modules in networks with artificial community structure in a survey by 
Danon et al. lfT3l . which lead us to choose this algorithm for our work. 

The partition vector P is initialized such that each of the nodes is in its own 
module. Pi = /. Alternatively, one could randomly assign each node to one of a few 
modules to form the initial partition. We found, however, that in this case the algo- 
rithm will split these few large modules into a large number of very small modules 
before slowly merging them into the final result. Since splits of large modules, in- 
volving a recursive simulated annealing run, are computationally very expensive, 
we avoid them by starting with a partition of single-node modules. 

A small modification of the partition is then made (see below) to obtain a new 
partition P' and its effect on the modularity value, AQ = Q{P') - Q{P).lf AQ > 0, 
the new partition is better than the old one and we replace P = P' . If AQ < 0, the 
partition is only accepted with probability pt{AQ) ~ exp{AQ/T), where T is a 
"temperature" that controls the typical penalty on Q we are willing to accept with 
the new partition P'. 

This procedure is repeated a number of times, initially with a high T — Tq accept- 
ing modifications with large negative impact on modularity and therefore allowing 
to sample multiple local maxima. After 0{N^) modifications, the temperature is 
lowered by a cooling factor c. When T is small enough, worse partitions are not 
accepted anymore and the partition P has "annealed" into a local maxima of the 
modularity landscape Q{-). 
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During each temperature step, we intersperse fN^ local with fN global modi- 
fications of the partitions, where / is a tuning parameter. A local modification is a 
switch of one node to another, randomly selected, module, while a global modifi- 
cation can be a merge of two or a split of one randomly selected module. Finding 
a suitable split of a module that is not immediately rejected is done by recursively 
running a simplified version of the simulated annealing algorithm on it: the module 
in question is extracted and treated as an independent network, initially randomly 
partitioned into two modules. Only local modifications are allowed while annealing 
this bipartition into a local modularity maximum. Afterwards, the split module is 
replaced into the full network and evaluated against the modularity value of the full 
partition. 

We observed that the global structure of the partition is found quickly by the al- 
gorithm and mostly only local modifications are accepted at low temperatures. Since 
the split operations are computationally intensive, we therefore track the number of 
rejected split modifications in each temperature step and reduce the probability of 
future trials if that number is high. 

To generate the large ensemble of partitions discussed in Section |5] we used 
Tq = 2.5 • 10^"* as initial temperature, c — 0.75 as the cooling factor, and / — 0.05. 
We abort the procedure and accept the partition as "optimal" if no better partition is 
found in three consecutive temperature steps. 

The run time of this stochastic algorithm depends in a complex way on both the 
size and the structure of the the input, and therefore the time complexity does not 
scale with a simple function of the input size. However, we found the algorithm to 
perform very well and in acceptable runtime (60-90 minutes on a 2.8 GHz proces- 
sor for most runs) with the configuration given above, although these parameters 
are less conservative than those proposed by Guimera et al lETI . The large ensem- 



ble of resulting partitions (Figure 15 1 has a tight distribution of modularity values. 



indicating the algorithm tends to converge onto a stable maximum. 



3 A proxy for multiscale human mobility networks 

Here we construct a proxy network for human mobility from the geographic cir- 
culation of banknotes in the United States. Movement data was collected using the 
online bill-tracking game at www . wheresgeorge . com. Individuals participating 
in this game can mark individual bills and return them to circulation; other individ- 
uals who randomly receive bills can report this find online along with their current 
location (zip code). Our analysis is based on the intuitive notion that the coupling 
strength between two locations / and j increases with wf-, the number of individuals 
that travel between a pair of locations per unit time, and furthermore that the flux 
of individuals in turn is proportional to the flux of bank notes, denoted by Wij. Evi- 
dence for the validity of this assumption has been obtained previously Il7ll8l l25]| and 
we provide further evidence below. 
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As of January 15th, 2010 a total of 187, 925, 059 individual bills are being tracked 
at the website |www . wheresgeorge . com| Approximately 1 1.24% of those have 
had "hits", i.e. they were reported a second time at the site after initial entry. The 
current analysis is based on a set of A'o — 11,950,239 bills that were reported at 
least a second time. For each bill n we have a sequence of pairs of data 

B„ = {Z„,i,r„,,} / = 0,...,L„ n=l,...,A^o 

of zip codes Z„ and times r„ , at which the bill was reported. Each B„ reflects a geo- 
graphic trajectory of a bill with L„ individual legs. In total, we have 14,612,391 sin- 
gle legs in our database. Note that the majority (81.78%) of trajectories are single- 
legged reflecting a reporting probability of ss 20% during the lifetime of a bill. 

The set of B,, represents the core dataset of our analysis. For each bill we have 
additional information: 

1. Denomination: $1, $2, $5, $10, $20, $50, or $100. The fraction of each denomi- 
nation is depicted in Table [T] 

2. The Federal Reserve Bank code, A through L, corresponding to one of 12 of the 
United States Federal Reserve Banks that issued the bill. The fraction of bills as 
a function of FRB origin is provided in Table |2] 

Table 1 Denominations in the WG dataset 



Denomination 


$1 


$2 


$5 


$10 


$20 


$50 


$100 


Number of bills ' 


9,931,261 


36,639 1,069,427 


401,101 


461,076 


24,209 


26,526 


Fraction [%] 


83.11 


0.31 


8.95 


3.36 


3.86 


0.20 


0.22 



Table 2 Absolute number and relative fraction of bills based on Federal Reserve Bank 



FRB Code Location Count Fraction [%] 



A 


Boston 


799,537 


6.69 


B 


New York City 1,325,942 


11.10 


C 


Philadelphia 


822,340 


6.88 


D 


Cleveland 


661,278 


5.53 


E 


Richmond 


948,516 


7.94 


F 


Atlanta 


1,565,732 


13.10 


G 


Chicago 


1,207,448 


10.10 


H 


St. Louis 


472,930 


3.96 


I 


Minneapolis 


360,194 


3.01 


J 


Kansas City 


713,393 


5.97 


K 


Dallas 


869,866 


7.28 


L 


San Francisco 


2,203,063 


18.44 



We restrict the analysis to the lower 48 states and the District of Columbia (thus 
excluding Hawaii and Alaska) and consider only legs with origin and destination 
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locations in these states, reducing the original dataset to 11,759,420 bills (98.40% 
of the original data) and 14,376,232 trajectory legs (98.38%). 

The spatial resolution of the dataset is given by 41, 106 zip codes, with mean 
linear extent of 14km. The mean linear extent of the lower 48 states is 2,842km 
defining the bounds of the system. For each zip code Z, we use centroid information 
to associate with each report a longitude/latitude location x — (0, 0), such that each 
trajectory n corresponds to a sequence of geographic locations X, with / = 1, . . . ,L„: 

t„: {X„.o,Zi7;,i,X„j,...,Zir„,L„,X„x„} with n=l,...,A/o (2) 
where X„ o is the initial entry location, and A T„ j — T„,i — T„,i^ i are inter-report times. 



3.1 Geographical distributions 

Based on these trajectories we define the density of initial entries as 

1 " 

m(x) = -£5(x-X„,o) (3) 
and the density of reports as 



PR{^)^^t^j:5{x-X„j), (4) 

where 8 is the Dirac delta function, equal to 1 when its argument is and equal to 
otherwise. 

In order to assess the spatial distribution of reports and initial entries and to quan- 
tify the correlation with the population density we compute the number of reports 
and initial entries for each of the M = 3, 109 counties in the lower 48 states. Defining 
for each county k a characteristic function 

, , fl if xeft 

Xk{^)^<^ . . (5) 
1 otherwise 

where /\ is the polygon defining the county's interior, the number of reports and 
initial entries in county k are given by 

mR (k) ^{Xk)R = J Xk (x) PR (x) dx and m/£ (k) ^{Xk)iE= J Xk (x) Pie (x) dx, 

respectively. Figure |3] compares the distribution of reports mR{k), initial entries 
mjEik) and the population P{k) of the 3,109 counties. As all three quantities are 
positive and vary over many orders of magnitude, the maps depict logio(mR), 
logjQ(m/£) and logio(f ). Qualitatively, reports and initial entries correlate strongly 
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with the population density. Computing the correlation coefficient of the logarith- 
mic quantities yields c{R,P) = 0.933 and c{IE,P) = 0.819. Despite the expected 
increase of mR{k) and mjE{k) with P{k), only the report count increases approx- 
imately linearly with population size, whereas initial entries show a deviation for 
small populations. We believe that this deviation is a consequence of the social dif- 
ference between the subpopulation of "Georgers" that are responsible for initiating 
bills and entering them into the system, "actively" playing the game, and the larger 
group of people that randomly receive a bill and report it, "passively" participating. 
This hypothesis could explain that areas with higher population densities contain a 
larger proportion of internet-savvy communities that are inclined to become Georg- 
ers and initiate bills. In order to exclude a potential bias caused by this effect we 
exclude all the legs in (|6]l that contain an initial entry as the origin, i.e. we only 
investigate the reduced set 

f2,«: {X„j,4r„,2,X„,2,...,4r„,L„,X„,L„} with n=l,...,No (6) 

that excludes the first legs of all t„. Excluding the first leg reduces the number of 
bills to 4,743,330. However, the key results, e.g. the border structures discussed 
in Section |5] are robust against the inclusion of initial entries. Computing mobility 
networks based on either set, f„ or t2.„ does not change the observed pattern signifi- 
cantly. 



3.2 Distance and time: spatially averaged quantities 



Fig. 4 The estimated prob- 
abiUty p{r\t < t) of a bill 
traversing a distance r in 
time t < t where T = 4 days. 
In red a maximum likeli- 
hood fit of the the func- 
tion p{r) ~ r"(l+^) with 
P = 0.7056. 




From t2,„ we extract pairs of spatio-temporal leg distances {ds{X„j,X„j^i),AT„j}, 
where ) denotes the distance on a sphere (shorter segment of the great cir- 
cle that passes through both points). This type of dataset was first investigated in 
2006 based on a much smaller core dataset of bill trajectories Q. In particular, the 
combined probabiUty density (pdf) 
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p{r,t) = (5(r-flf,(X„,-,X„,,_i))5(f-4r„,,)) (7) 

was estimated as well as marginal pdfs p{r) and p{t). The central finding of the 
2006 study was that p{r) ^ r^(^+P) and that the time evolution of the density Q 
can be described by a bi-fractional diffusion equation. Here we reproduce some 
of the properties before we construct the mobility network used in the main text. 
Figure |4] shows the short time pdf of a bill traversing a distance r in a time f < T 
where we chose T = 4 days. Using maximum likelihood we find this function can 
be described by a power-law 

p{r) - with j3 = 0.7056 ± 0.0659. 

This power law describes the dispersal characteristics on a population-averaged 
level. The short-time distance pdf represents a dispersal kernel and for small times 
t approximates the instantaneous rate of traversing a distance r. 

Complementary to this, temporal aspects of the process can be revealed by com- 
puting the pdf for the time f between reporting events given that these occur within 
a small radius r > ro. Figure |5] depicts p{t) for all legs with r < 10 km and a mini- 
mal inter-report time of tmin — 1 day. The inter-report times are described well by a 
power law moderated by an exponential factor 

p(f) -f-^g-'/^o with 7b =248 ±27, a = 0.99 ±0.05. (8) 

The observed power-law decay ^ f^' for times f ^ To is intriguing. These type of 
decays have been observed in a multitude of contexts involving human activity, for 
instance the time between consecutive phone calls [25 1, emails f5l and the number 
of words between two identical words in texts yj. A consequence of this law is 
bursting behavior, i.e. given an event occurred at time fo the probabihty rate that 
an event occurs immediately after the first is higher than expected from ordinary 
Poisson statistics. This behavior is best illustrated by the so-called hazard function 
h{t) that quantifies the instantaneous probability rate of an event happening at t 
given that the last event occurred at f = 0. If we let 

P{T>t) = J^ p{s)ds 

be the cumulative probability that the second event occurs at a time T later than f , 
the hazard function is defined by 

For a Poisson process with rate 7 we have 

h{t) = y P{T>t)^e-^. 
The hazard function can be computed according to 
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50 100 150 200 250 300 350 10° 1o' 10^ 10^ 

interreport time T time t after most recent event 




1 2 3 4 5 6 7 

T [days] 



Fig. 5 Inter-report time statistics. (Left) The function p{t\r < ro) for cq = 10km. The observed 
function can be accounted for by an initial algebraic decay f^' moderated by an exponential func- 
tion for large arguments. The red dashed curved is a fit obtained from maximum-likelihood esti- 
mation. (Right) The hazard function h{t) that represents the instantaneous rate of an event at time 
t provided that an event occurred at f = 0. The dashed lines represent reporting rates of once per 
2 weeks (top), once per month (middle) and once per To = 248 days (bottom). (Bottom) p{t) for 
very short times. A zoom-in resolves daily oscillations modulated by the decay observed on the 
left. These oscillations indicate that users tend to report to the website at the same time of the day 
with the highest probability. 

/.(0^-^log[P(.>r)]^^. 

Figure [5] depicts the function h{t) for inter-report times in the WG data. For small 
times (f < 1 week) the probability rate for a report is of the order of one report per 
two weeks, which is also the expected time between two reports in this time window. 
For larger times (f > 100 days) the constant value of l/Tb is approached, equivalent 
to one report in 3/4 of a year. Possible explanations of the bursting behavior and the 
initial algebraic decay in p(f ) are a strong behavioral heterogeneity of players that 
participate in the game or an effective queueing in the system, i.e. bills may enter 
shops and initially have a comparatively high likelihood of leaving, being "on top 
of the stack". As time passes these bills may "get stuck" and equilibrate to the long 
time scale present in the system. 
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From the trajectories defined by (j6]) and the characteristic functions of the coun- 
ties (|5]l we construct a matrix w^j that counts the number of legs which originate at 
county i and terminate at j, 

n= \ k=2 

where 0( ) is the Heaviside step-function. In order to exclude potential biases in- 
duced by initial entries we ignore the first leg of all trajectories {k — lm the above 
sum). This choice is motivated by the fact that the community of individuals that 
initiate bills might be less representative than those that find bills and report them. 
Indications that this might have an effect are supported by the different scaling be- 
havior of initial entry frequencies with population as compared to report frequencies 
with population. The factor &{T — AT„ i^) excludes legs that have an inter-event time 
larger than time T . The matrix Wij need not to be symmetric, as the flux of bills from 
i j need not equal those that travel j i. However, as Figure[6]indicates the flux 
matrix is statistically symmetric. Plotting Wij against Wji indicates a clear mean lin- 
ear relationship. Since we base our analysis on the flux of money between two given 
counties we symmetrize the network and use Wij in our analysis defined by 

which of course also depends on the time threshold parameter T. Choosing the 
optimal value for T is a trade-off between trying to estimate instantaneous flux, 
i.e. choosing T as small as possible, and using as many legs as possible to decrease 
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fluctuations, i.e. choosing large values for T . Choosing a value for T < 30 days for 
instance rules out bills that visit a Federal Reserve Bank in between reports in coun- 
ties / and j, as bills that enter FRBs do not return to circulation until approximately 
3^ weeks after entering the FRB. To make sure that our results do not significantly 
change as the parameter T is varied we performed the analysis for various values 
of T ranging from a few days to 7" = 1 year The computed border structure does 
not significantly depend on the value of T . Decreasing T thins out the network and 
reduces the overall connectivity, yet the effects are similar to bootstrapping the net- 
work randomly, a process that also does not change our results and is discussed in 
SectionO 



3.4 Gravity as a null model 

In addition to the empirical data described above, we also construct a synthetic mo- 
bility network based on the well-known gravity law hypothesis HOl |42l to serve 
as a null model. In gravity models the interaction strength between a collection of 
sub-populations with geographic positions x,, sizes A^,- (obtained from census dateQ), 
and distances dij ~ \xi—Xj \ is given by 

^'V-^ (9) 

Uij 

in which a, )3 and pL are non-negative parameters. 

To create a model network comparable to our data, we first compute pij for all 
counties / and j in the continental U.S. and normalize them such that Y.i,jPij — 1- We 
then interpret these values as probabilities for a travel event to happen between the 
two counties (or, speaking in terms of the original data source, a dollar bill report). 
Thus, starting with all-zero link weights Wjj, we repeatedly draw a pair of nodes 
according to pij and increase the corresponding w,y by one, until approximately the 
same connectivity (number of non-zero wij) as in the real-data network is reached. 

We generated gravity networks for different parameter values and gauged them 
against our real data by comparing the distributions of first-order network statistics 
to find the best fit to our data. Distributions have been compared by log-binning the 
values and computing the x'^ statistic 

i i 

where n is the number of bins and A^p {Nf) is the number of values from the gravity 
(real-data) network in bin /. 
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Fig. 7 goodness-of-fit for 
different parameters of the 
gravity law. The minimum is 
at (a,/i) = (0.96,0.3). 
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Fig. 8 Distributions of modularity values for an ensemble of 80 partitions each computed for 
snapshots of the model network at different connectivities. The dashed line corresponds to 0.0765, 
the connectivity of the real-data mobility network. 



Our real data is symmetric and node fluxes are proportional to population sizes, 
therefore we assume a = j3 ~ 1 to narrow down the search volume in parameter 
space. We computed for the distribution of link weights, node fluxes and geo- 
graphical distances and used the sum of them, Xw + + Xd^ the goodness-of- 
it measure. Figure |7] shows this quantity for (oc,/i) G [0.8, 1.2] x [—0.4,0.9], from 
which we concluded that a — P — 0.96 and jj. = 0.3 are the best parameter choices. 
The resulting network and first-order statistics are shown in Figure |9] 

Similar to the bootstrapping procedure described in Section [7T| we tested the ro- 
bustness of the community structure of the model network by generating snapshots 
of the network at different connectivities and computing an ensemble of 80 high- 
modularity partitions for each snapshot. We found that the modularity statistics are 
stable around the target connectivity of 0.0765 (FigurelHll. 
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Fig. 9 Comparison of the real-data network (top left) and the gravity model network with a = j8 = 
0.96 and = 0.3 (top right). The bottom plot shows the distributions of geographical distances d, 
link weights vv, and node fluxes / in the real-data network (blue lines) and the gravity model (green 
lines). 



4 Degeneracy and superposition 

Given a mobility network constructed from the Where's George data, we then apply 
the optimization algorithm described in Section|2]to generate community partitions. 
Since the optimization process is stochastic, the resulting partition varies between 
realizations of the process. Two representative examples of high-modularity parti- 



tions are displayed in Figure 10 Note that, although modularity only takes into ac- 
count the structure of the weight matrix W and is explicitly blind to the geographic 
locations of nodes, the effective large-scale modules are spatially compact in ev- 
ery map. Consequently, although long-distance mobility plays an important role, 
the massive traffic along short distances generates spatial coherence of community 
patches of mean linear extension / = 633 ± 250 km. Note however that although 
each maps exhibits qualitative similarities between detected large-scale subdivisions 
and although each of the maps possess a high modularity score, obvious structural 
differences exist; in fact, even if it were possible to determine a partition with max- 
imal modularity, any such partition is not in principle unique. It is thus questionable 
whether any single effective map can be considered the most plausible partition. 

Theoretical concerns aside, recent work |23| has identified practical issues with 
modularity maximization, in particular the so-called resolution limit. We demon- 
strate that a superposition of community partitions can alleviate these issues with 
the modularity score, and to this end discuss its known shortcomings in more detail. 

In fact, it is straightforward to construct networks of which several distinct par- 
titions with equal and maximum modularity value exist. This degeneracy ofmodu- 
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Fig. 10 High-modularity community partitions of the WG mobility network. The stochastic algo- 
rithm produces different partitions when run many times; these are two representative examples. 
Modularity values are 0.6808 (left) and 0.6807 (right). 



larity was independently found by Good et al. 11261 and marked as a drawback of the 
modularity measure. 

Fortunato and Barthelemy (23) also report on the resolution limit of modularity. 
The authors present two artificial, unweighted networks that exhibit an intuitively 
very clear community structure, yet partitions exist that do not reflect this structure 
but have a higher modularity value than the partition that does. In particular, these 
networks are constructed by connecting multiple fully connected graphs ("cliques") 



with single links (Figure 1 1 1. It is clear that every clique should be grouped into one 
module, but the best partition according to modularity will group multiple cliques 
together. This only occurs if the cliques are small (in terms of number of links) com- 
pared to the full network, thus the modularity measure cannot detect communities 
below a certain resolution limit. 

Our proposed method combines an ensemble of partitions by focusing on the 
boundaries of a partition ("Which adjacent nodes are separated into different mod- 
ules?") rather than its volumes ("Which nodes are grouped together?"), and then 
computing for each boundary the fraction of partitions in which it exists. Because 
we are interested in geographically embedded networks and modules are virtually 
always spatially compact in our case, we can restrict ourselves to boundaries that 
are also real geographical borders between nodes. However, the idea can be easily 
generalized to non-geographical networks, at the expense of convenient straight- 
forward visualization. Since all partitions in the ensemble have a high modularity 
value, this method highlights similarities and differences in degenerated partitions, 
yielding a unique "partition" (or to be more precise, a map) of the network and thus 
overcoming the degeneracy problem. 

In our method any single partition obviously suffers from this limitation as 
well. However, the resolution limit can be alleviated by looking at an ensemble, 
if enough small modules exist to create degeneracies. To illustrate this, we applied 
our method to the two example networks from Fortunato and Barthelemy |23 1. Fig- 
ure 1 1 (left) shows a ring of 34 6-cliques, all connected to their neighbors by a 



single link. The intuitive partition in which each clique is in its own module has 
modularity Qren\ = 0.9081 while a partition that groups pairs of cliques together has 
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Fig. 11 Two networks that expose the resolution limit problem with modularity. The shaded areas 
indicate an artificial geography for nicer visualization of the boundaries in the next figures. (Left) 
A ring of 34 cliques, each of 6 nodes and connected to their neighbors by single links. (Right) A 
network of two 20-node cliques and two five-node cliques. 




OO 



Fig. 12 (Left) The optimal partition in the clique ring groups pairs of cliques together (the same 
color is used for multiple modules). (Center) Example of a partition found by the modularity 
optimization algorithm. (Right) Superposition reveals boundaries in the clique ring between every 
clique. Color codes the fraction of partitions in which the boundary was found. We use T = 2.5 • 



10- 



: 0.75, and / = 0.5 for this example and the next. 



2opt = 0.9099. However, two distinct partitions exist that group pairs of cliques. 
Thus, an ensemble of optimal partitions will be composed out of those two parti- 
tions, yielding a boundary map in which every boundary between two cliques ap- 
pears in 50% of the ensemble partitions. For nicer visualization, we created an arti- 
ficial geography for this network and computed partitions and boundaries, shown in 



Figure 12 Due to the nature of our algorithm, the resulting partitions contain a few 
n-tuples of cliques and single-clique modules that have not been split or merged into 
perfect clique-pairs before the termination criterion, and thus the observed bound- 
aries are stronger than expected. 

The second network proposed in Fortunato and Barthelemy is constructed from 
two 20-cliques and two 5-cliques (Figure [TT| (right)). Here, the two smaller cliques 
are merged into one module by the optimal partition (2opt = 0.5426), although one 
would again expect each of them to be in its own module (2reai — 0.5416). Our 
method is not able to capture the intuitive community structure in this case (Fig- 



ure 13 I, because no degeneracy exists (the partitions in which only one of the small 
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cliques are grouped with the large one, but not the other, are too far from the opti- 
mum to be produced by the algorithm, Qdeg = 0.4959). 




But if we extend the network such that four small cliques exist, the partition 
which groups all cliques into their own modules is still suboptimal to any partition 
that groups together more than one of the small cliques, but degeneracies are created 
and the ensemble of partitions reveals the true community structure in this network 
(Figure 14 1. 

In conclusion, our method is able to dissolve both the degeneracy and resolution 
limit problems if enough small modules exist to create degeneracies. In fact, we will 
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observe small "building blocks" in the WG data that are not seen in single partitions 
but emerge from the superposition of a partition ensemble. 



5 Assessment of border structures 

Using the algorithm described in Section|2j we compute an ensemble of 1,000 par- 
titions of the WG mobility network, all exhibiting a high modularity (Q = 0.6744 ± 



0.0026, see also Figure 15 for the distribution of modularity values) and spatially 
compact modules, and perform a linear superposition of the set of maps. This 
method extracts features that are structural properties of the entire ensemble. The 
most prominent emergent feature is a complex network of spatially continuous ge- 



ographic borders (Figure 16 1. These borders are statistically significant topological 
features of the underlying multi-scale mobility network. An important aspect of this 
method is the ability to not only identify the location of these borders but also to 
quantify the frequency with which individual borders appear in the set of partitions, 
a measure for the strength of a border 



Fig. 15 Ensemble statistics of 
geographic subdivisions for a 
set of W = 1 , 000 partitions. 
The number of modules k 
in each subdivision is nar- 
rowly distributed around 13 
(grey bars), and so are the 
conditional distributions of 
modularity (superimposed 
whisker plots). The ensemble 
mean is ^ = 0.674 ± 0.0026. 
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Investigating this system of effective mobility borders more closely, we see that 
although they correlate significantly with territorial state borders (p < 0.001, see 
Section |7]i they frequently occur in unexpected locations. For example, they ef- 
fectively split some states into independent patches, as with Pennsylvania, where 
the strongest border of the map separates the state into regions centered around 
Pittsburgh and Philadelphia. Other examples are Missouri, which is split into two 
halves, the eastern part dominated by St. Louis (also taking a piece of Illinois) and 
the western by Kansas City, and the southern part of Georgia, which is effectively 
allocated to Florida. Also of note are the Appalachian mountains. Representing a 
real topographical barrier to most means of transportation, this mountain range only 
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Fig. 16 Effective borders emerge from linear superposition of all maps in the ensemble (blue 
lines). Intensity encodes border significance (i.e. the fraction of maps that exhibit the border). 
Black lines indicate state borders. Although 44% of state borders coincide with effective borders 
(left pie chart), approximately 64% of effective borders do not coincide with state borders. These 
borders are statistically significant features of the ensemble of high modularity maps, they partially 
correlate with administrative borders, topographical features, and frequently split states. 



partially coincides with state borders, but the effective mobility border is clearly cor- 
related with it. Finally, note that effective patches are often centered around large 
metropolitan areas that represent hubs in the transportation network, for instance 
Atlanta, Minneapolis and Salt Lake City. We find that 44% of the administrative 
state borders are also effective boundaries, while 64% of all effective boundaries do 
not coincide with state borders. 



5.1 Comparison to gravity models 

We also investigate whether the observed pattern of borders can be accounted for 
by the prominent class of gravity models 12] [TO] |42l, frequently encountered in 
modeling spatial disease dynamics |42|. In these phenomenological models it is as- 
sumed that the interaction strength Wij between a collection of sub-populations is 
given by (|9|, and we construct such a model according to the procedure described 
in Section |3.4| Although their validity is still a matter of debate, gravity models 
are commonly used if no direct data on mobility is available. The key feature of a 
gravity model is that Wij is entirely determined by the spatial distribution of sub- 



populations. We therefore test whether the observed patterns of borders (Figure 16 1 



are indeed determined by the existing multi-scale mobility network or rather indi- 
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rectly by the underlying spatial distribution of the population in combination with 
gravity law coupling. Figure 18 illustrates the borders we find in a network that 
obeys equation (j9]). 

Comparing this model network to the original multi-scale network we see that 
their qualitative properties are similar, with strong short-range connections as well 
as prominent long-range links. However, maximal modularity maps typically con- 
tain only five subdivisions with a mean modularity of only Q = 0.4791. Because 
borders determined for the model system are strongly fluctuating (Figure 17 1, they 
yield much less coherent large-scale patches. Some specific borders, e.g. the Ap- 
palachian rim, are correctly reproduced in the model. The difference between the 
borders of the model system and the empirical data is statistically significant (see 
Section|7]l, and we conclude that the sharp definition of borders in the original multi- 
scale mobility network and the pronounced spatial coherence of the building blocks 
are an intrinsic feature of the real multi-scale mobility network and cannot be gen- 
erated by a gravity model that has a maximum first-order statistical overlap with the 
original mobility network. 





Fig. 17 Sample partitions of the gravity network. Although they share qualitative features with 
those from the original network (Figure [TO) , generic partitions of the gravity model network are 
structurally different, typically exhibiting fewer modules per partition, in different locations and 
with less spatial compactness. 



6 Shortest-path tree clustering 

The methods already discussed successfully extract the structure of geographic bor- 
ders inherent in multi-scale mobility networks. Bootstrapping the network indicates 
that these structures are surprisingly stable in response to perturbations of the net- 
work, but neither the modularity measure nor the stochastic algorithm we use to 
discover partitions provide specific information about the substructures in the net- 
work that make these borders so robust. What feature of the network, more specif- 
ically which subset of links if any, generates the observed borders? In order to ad- 
dress this question and further investigate the structural stability of the observed 
patterns, we developed a new and efficient computational technique based on the 
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Fig. 18 The border structure of the gravity network (red) partially coincides with the borders in 
the original data (blue), but not significantly. The overlap is shown in green, for significance tests 
see Section|7] 



concept of shortest-path trees (SPT). Like stochastic modularity maximization, this 
technique identifies a structure of borders that encompass spatially coherent regions 
(Figure [23]l, but unlike modularity this structure is unique. More importantly, it iden- 
tifies a unique set of connections in the network, a network backbone, that correlates 
strongly with the observed borders. 

This second method for identifying community partitions, based on topological 
features of the analyzed network, has three parts. Given a network with nodes 
containing a single connected component, we first compute a shortest-path tree for 
each node in the network. At least three widely-known algorithms are applicable 
(Dijkstra, Floyd-Warshall, and Bellman-Ford) and various optimizations are pos- 
sible; in addition, if the input is sparse some of these algorithms improve in time 
complexity. In the worst case, however, this can be computed in 0{N^) time. 

Second, we compute a dissimilarity score for each pair of shortest-path trees, and 
using the dissimilarity functions described below, this can also be accomplished in 
0{N^) time. 

Third and last, we apply hierarchical clustering to the table of dissimilarity 
scores, which also takes 0{N^) time for a naive implementation (because we com- 
pute the smallest element of an at-largest-A^-by-A^ table times). Therefore the 
entire suggested procedure takes 0{N^) time. 

As mentioned, various optimizations are possible for computing shortest-path 
trees and hierarchical clustering, and these algorithms are so widely used that high- 
quality, efficient implementations are easily available. In fact, we find that the sec- 
ond step, computing dissimilarity scores, actually dominates the running time al- 
though it is by far the simplest computation; this is due to the fact that we use 



24 



D. Grady, R. Bmne, C. Thiemann, F. Theis, and D. Brockmann 



interfaces to pre-compiled, canned routines for steps 1 and 3, while step 2 is a naive 
MATLAB script. In practice the entire analysis can be run start to finish in under a 
half hour for our network of = 3, 109 nodes on a circa-2008 laptop. 



6.1 Computing shortest-path trees 

The shortest path from vertex / to vertex j is the series of edges that minimizes 
the total effective distance d = ^/^ij along the legs of the path| 14 1. The distance 
along an edge for us is the inverse of the edge weight, as a highly-weighted edge 
indicates that two vertices are effectively proximal. (There are no edges with an 
infinite distance, because we do not define an edge between vertices if there is zero 
weight.) 

The shortest-path tree 7] rooted at node / is the union of all shortest paths origi- 
nating at / and ending at other nodes. We use the MATLAB interfac^to the Boost 
Graph Librarjj^to compute shortest-path trees. To prevent random fluctuations in 
our data from overwhelming the signal, we add a weak link between neighboring 
counties. 



6.2 Measuring tree distance 

A shortest-path tree can be easily represented as a vector of vertex labels T = 
[fjt],fe = I . ..N, such that ti^ is the label of the parent of vertex k, with a special 
symbol (perhaps 0) used to indicate the root. There are no disconnected nodes in the 
mobility network, thus each tree vector represents a single tree and not a forest. This 
representation lends itself to straightforward and meaningful comparisons between 
two trees. 

We define two related measures of the dissimilarity between two trees. The first, 
called parent dissimilarity, asks the question, how many of the vertices in Ta do 
not have the same parent in Ig? We denote this by Zp{Ta,Tb), and it is exactly the 
general Hamming distance of two symbol sequences, that is, the number of places 
where corresponding labels in and Tg do not match. The second, called overlap 
dissimilarity, asks the question, how many edges do the two trees not share? It is 
defined as Zo{Ta,Tb) = Smax — s{Ta,Tb). Here, s„,ax is the largest number of edges 
two trees could share, which is the number of vertices less one (since the root does 
not contribute an edge). s{Ta,Tb) is the number of edges that Ta and Tg do share, 
and where Zp asks essentially the same question considering edges to be directed, 
Zo considers edges to be undirected. Also note that although we consider only the 
topology of trees when measuring their dissimilarity, the topology is determined 
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by the weight of edges in the original graph and thus the mobility dynamics. For 
both measures, possible z values range from (completely identical trees) to A^, the 
number of nodes in the network. 

We compute both measures for each distinct pair of trees in our network and find 
that they are highly correlated (the Pearson correlation coefficient of the two sets is 
0.9980). For this reason, and because of the more straightforward interpretation, we 
focus exclusively on Zp. The parent dissimilarity values in our data range from 2 to 
240. 

To test the stability of this measure we also added various amounts of noise to the 
original weight matrix; for example, adding 1 % noise means that we adjusted each 
entry by a random number such that its perturbed value is within 1 % of its origi- 
nal value. We then compute the set of shortest-path trees for the perturbed weight 
matrix, calculate the tree dissimilarities, and then compute the Pearson correlation 
of the original dissimilarities and the perturbed. The results (0.9995 for 0. 1% noise, 
0.9984 for 1% noise, 0.9937 for 5% noise) indicate the method is robust against 
small perturbations, and in addition we do not observe significant changes in the 
structure of borders determined by the perturbed matrices. 



6.3 Hierarchical clustering and borders 

The measures described above produce a dissimilarity matrix well-suited for use 
with hierarchical clustering [20J. This technique iteratively groups data points to- 
gether into clusters that are less and less similar; it begins by identifying the two 
points with the lowest dissimilarity and grouping them together, then finding the 
next-most-similar data point or group, and so on. When it is necessary to compare 
the dissimilarity of one point (or group of points) with another group of points, a 
linkage function is used. There are several commonly-used linkage functions; we 
compute single linkages (comparing the shortest distance between two groups), av- 
erage linkages (the average distance between two groups), and complete linkages 
(the greatest distance between two groups) and find that the average linkage pro- 
duces the best fit to our data (Table [3]l. 

Table 3 Cophenetic correlation coefficients L37j for various linkage functions using parent dis- 
similarity of trees and inverse weight of links 



Linkage 


Zp{Ti,Tj) 




single 


0.6584 


0.1883 


average 


0.8048 


0.3757 


complete 


0.7197 


0.1400 



The result of the hierarchical clustering algorithm is a linkage structure that can 
be represented graphically with a dendrogram (Figure 19 1. The radial lines in the 
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dendrogram represent vertices in our network or groups of vertices, and the arcs 
represent a link that joins groups together in the hierarchy. The nearer an arc is to 
the center of the circle, the greater the dissimilarity between the groups joined by 
the arc. 

Each arc corresponds to a geographic border between a set of counties, and the 
closer the arc is to the center of the circle, the more significant the border. At the 
outermost level, the dendrogram necessarily puts a border around each individual 
county, and we threshold at 30% of the height of the tree (corresponding to a dis- 
similarity Zp — 41 .6019) for the analysis of the WG network. 

As you can see in Figure 21 the high-level groups identified by this procedure 
are spatially coherent, but may be divided into spatially disjoint regions at certain 
heights in the dendrogram. 

Hierarchical clustering is also sometimes applied directly to the inverse of the 
weights, 1 /wij. We have investigated this method as well and find that it has several 
shortcomings. First, to apply a hierarchical clustering algorithm requires computing 
a dissimilarity for every pair of data points; since many pairs of counties are not di- 
rectly connected by a link in our network (wij is zero), the inverse does not exist and 
it is consequently necessary to add some noise to the weight matrix at the very first 
step, representing pairs of vertices that are 'extremely distant' but not disconnected. 
Second, the linkage structures produced from this approach fit the data poorly (Ta- 
ble [3]l. Last, one can see by visual inspection of the dendrograms in Figure 19 that 
this approach does not yield significant information. Comparing the dendrograms 
for the Zp and 1 /w matrix, we see that in the shortest-path tree approach most of 
the links that appear higher in the tree (closer to the center) are linking together two 
groups that are strongly dissimilar from one another (seen by comparing the height 
of the parent link to the heights of the children links). In the inverse weight method, 
this is not true: links high in the tree are linking groups that are quite similar; that is, 
inverse weight clustering does not identify groups of strongly dissimilar vertices. 

Although the method yields a unique sequence of topological segmentations, the 
observed geographic borders exhibit a strong correlation with those determined by 
modularity maximization (Figure 20 1. 



6.4 Link significance 

The key advantage of this method is that it can systematically extract properties 
of the network that match the observed borders. A way to demonstrate this is to 
measure the frequency a at which individual links appear in the ensemble of all 
SPTs, which is conceptually related to their link betweenness |35 1. Computing this 
link significance a for each connection, we find that the distribution P{o) of the 



network is bimodally peaked (Figure 22 1. This is a promising feature of P{cy) as it 



allows labeling links as either significant or redundant without introducing an arbi- 
trary threshold which is necessary for more continuously distributed link centrality 
measures. Extracting the group of significant links and constructing a subnetwork 
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Fig. 19 Dendrograms from hierarchical clustering. (Left) Using the parent dissimilarity matrix 
and average linkage. Colors correspond to a particular community partition depicted in Figure [21] 
(Right) Using the inverse weight matrix with noise and average linkages. Even inspection by eye 
reveals immediately that clustering of the inverse weight matrix produces a poor fit to the data, 
pointing to the need for some type of 'pre-conditioning,' here provided by SPT dissimilarity. 




Fig. 20 Comparing borders from modularity maximization (blue) with SPT clustering (red) re- 
veals a significant overlap (green). The cumulative topological overlap (see Section^ is 0.5282 
indicating that the SPTD method represents an alternative computational approach to border ex- 
traction. 



from these links only we observe that this subnetwork matches the computed bor- 
der structure. By virtue of the fact that the most frequently shared links between 
SPTs are local, short-range connections we see that the SPT boundaries enclose lo- 
cal neighborhoods and that the boundaries fall along lines where SPTs do not share 
common features. Note that effective metropolitan areas around cities can be de- 
tected with greater precision than modularity, although the western US is detected 
as effectively a single community. 

Finally, we performed statistical analyses that quantify the overlap of the effec- 
tive, mobility-induced borders with those provided by census-related systems. We 
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Fig. 21 The geographic partition determined by cutting the dendrogram of Figure [19| at a height 
of 95 

Fig. 22 The distribution of 
link significance <T, defined 
for each link as the number 
of shortest-path trees the link 
appears in, exhibits a strong 
bimodal distribution. This 
implies that SPTD can sort 
links into important or not, 
and that <J is approximately a 
binary variable. 




choose the set of borders separating the states, the borders defined by the districts of 
the 12 Federal Reserve Banks, and the borders of Economic Areas ||39]| . We discuss 
this analysis in more detail in Section]?] but briefly, we find a significant correlation 
with economic boundaries (p < 0.001, z-score 8.024 for the modularity borders and 
p < 0.001, z-score 13.29 for the SPT borders). 



7 Significance and comparison of border structures 
7.1 Bootstrapping the Where's George data 

In order to test the robustness of our method against random data removal, we per- 
formed the following bootstrapping analysis. Starting with the full dollar bill dataset, 
and the resulting network weight matrix W with elements wij, we randomly remove 
single dollar bill reports until the total flux / = Wij is reduced by a factor y. 
Using this method we constructed several networks for < 7 < 0.95 and computed 
an ensemble of 100 partitions for every value of 7, using the simulated annealing 
algorithm described in Section |2] We find that the modularity value is unaffected 
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Fig. 23 By comparing the border structure from SPT clustering with the ensemble of significant 
links (those that appear in at least half of the shortest-path trees) we identify topological structures 
which reveal the core of the network that explains the majority of border locations. This core is 
represented by the network in blue consisting of star-shaped modules centered around large cities 
(yellow squares). 



by bootstrapping even if 95% of the total flux is removed, although the number of 
modules in each partition rises as the network is thinned out more than 85% (Fig- 
ure 24 1. Also, the boundary structure emerging from superposition of all partitions 



is very robust under this procedure (Figure 25 i. At 20% of the original flux (7=0.8) 
virtually all of the boundaries found in the complete network are still identified, al- 
though the sparsity of the data evokes some singular counties. Even with only 5% 
of the flux, when boundaries become more fuzzy, some of the original structures are 
still detected. 
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Fig. 24 Distributions of modularity values and number of modules for an ensemble of 100 parti- 
tions computed for each value of the bootstrapping parameter 7. The dashed line corresponds to 
78.2%, the amount of flux ignored if all links shorter than 400 km would be removed. 
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Fig. 25 Linear superposition of 100 partitions for four different values of the bootstrapping pa- 
rameter 7, color-coded according to the fraction of partitions they appear in. 



7.2 Measuring overlap of two boundary networks 

In this section, we describe how to compare boundary networks defined on a planar 
graph, in our case the county network of the continental US excluding Alaska. 

A boundary network b is simply given by assigning a nonnegative number w to 
each edge between adjacent counties: If the two counties are not divided by b then 
w = 0. Otherwise w > implies that the border shared between the two counties has 
the strength w. In Section |4] we described how to generate such a boundary network 
by superposition of many partitions of the Where's George money travel network. 
We denote this boundary network by the modularity boundaries bM- 

We want to quantify how much information the modularity boundaries bM shares 
with e.g. a state network, a random network, or a boundary network generated with 
another method. 

For this we essentially need to determine the cross-correlation between two 
boundary networks b and b' . However cross-correlation itself is not well-suited 
for dealing with the non-negativity of the edge weightings, so we calculate a non- 
centered version of it. The absolute cross-correlation of the two boundary networks 
b and b' is then given by the normalized scalar product of their edge weightings, 
i.e. by 
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a{b,b') := 



{l/\E\)^,^Eb{e)b'{e) 



^{l/\E\)^,^,b{e)^^{l/\E\)ZesEb'{er 



where E denotes the set of edges connecting adjacent counties. This quantity lies 
between and 1 and equals 1 if and only if the two boundaries are identical up to 
scaling. 

Apart from the upper bound, this quantity however is difficult to judge. In par- 
ticular, we cannot compare right away two cross-correlations between different net- 
works since a{-,-) might depend on the number of clusters and inhomogeneity of 
weights etc. We avoid finding a direct interpretation of the absolute cross-correlation 
by instead considering deviation of observed values against cross-correlations with 
a null model. 

Such null models are used to tell random occurrences of structures from true in- 
formation. One typically wants to keep some statistics of the network fixed while at 
the same time randomly sampling from its representational class. This results in the 
notion of random graphs with certain additional properties such as Erdos-Renyi flSl 
or Barabasi- Albert |4|. The key idea is to generate a random network preserving 
planarity and possible additional information by using the original structure and it- 
eratively changing it by a random local modification. For instance for unweighted 
networks, a random graph can be generated by 'rewiring': two distinct edges and 
two different vertices contained in either of the two are randomly selected and then 
swapped. Clearly this operation keeps both degree distributions fixed. After a cer- 
tain number of iterations, the thus-generated Markov chain produces independent 
samples of the underlying random graph with given degree distributions [33|. This 
concept has been generalized to weighted graphs B3l ; in this case it is debatable 
whether to swap the whole weighted edge or to split up the weight. 

In our case we search for a randomization of a boundary network i.e. of a planar, 
weighted graph. Rewiring as above is not possible since it would destroy planarity. 
Instead we propose to locally modify the graph at a random county: select a subpath 
of its boundary and flip it to its complement. In the case of non-trivial weights, we 
reassign a random number between and the minimal edge weight on the subpath. 



We have illustrated this procedure on an example in Figure 26 





Fig. 26 Local modification of a planar graph. We select the bottom left county to modify. The 
selected path to modify is shown in bold in the left figure. Its minimal weight is 1 . This is subtracted 
in the right hand figure, where the complementary path is shown. 



32 



D. Grady, R. Bmne, C. Thiemann, F. Theis, and D. Brockmann 



This procedure is now repeated multiple times until sufficiently de-correlated 
samples from the original network are produced. In practice, it is common to choose 
iterations in the range of the number of edges in the network or more. 




-i 

700 iterations 900 Iterations 



Fig. 27 Randomization of the modularity boundary network hM- The original network and the first 
900 iterations are shown. 



7.3 Randomization of the mean partition boundary of the Where's 
George network 

In order to test for significances of calculated similarities, we build a random model 
of the mean partition boundary by generating 1000 random networks using the 
above algorithm with > 15000 successful iterations for each random network. The 



corresponding maps for the first 900 iterations are shown in Figure 27 Clearly the 
original structure in the boundary network is increasingly diluted, and after > 10000 
iterations becomes stably random. 
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5000 10000 
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Fig. 28 Absolute cross-correlation of the randomized network after the given number of iterations 
with the modularity boundary network hM- 




1 random 

-state boundaries 



absolute correiation 

a absolute correlations of the null model and with the state boundaries 




Irandom 

-county boundaries 



absolute correiation 



b absolute correlations of the null model and &m with the county boundaries 



Fig. 29 Absolute cross-correlation of state and county boundaries when compared with a null 
model based on the modularity boundary network bM- 
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This can be seen by calculating the absolute cross-correlation a{bM,bi{) of the 
modularity boundary network bM with the random networks bg, when increasing 
the number of iterations, see Figure 28 We observe convergence to roughly 0.5 
after about 10000 steps. This lies well in the range of random correlation with a 
mean of 0.49 and a standard deviation of 0.028, see histogram in Figure 29 a). This 
implies that the randomization procedure converges to a set of random boundary 
networks, which can now be used to put calculated autocorrelations into perspective 
against this null model. 



7.4 Significances when comparing boundary networks with the 
null model 

We describe and quantify overlap of the estimated modularity boundaries bM with 
other political or social boundaries. As described before, we can quantify overlap 
by determining the absolute cross-correlation a{b,bM)- In order to determine inter- 
pretable numbers, we compare this value to correlations with random boundaries bR 
from a null model. 

We now determine significance of coincidence of the modularity boundary net- 
work bM and the SPT boundary network bg with 

• modularity boundaries bM, 

• state boundaries, 

• county boundaries (to test for sensitivity of the method against number of com- 
munities), 

• boundaries resulting from the SPT algorithm bs, 

• boundaries determined on the gravity model, 

• boundaries determined on long-range distances only, 

• federal reserve district boundaries (FRB), and 

• economic area boundaries ( |http : / /www . bea . govj l. 

The significance is calculated by replacing bM and b^, respectively, by elements 
from the corresponding null model. 

For illustration we show two histograms and actual values for state and county 
boundaries in Figure |29] Clearly the random cross-correlations are quite different, 
which means that we have to interpret the actual values of 0.439 and 0.398 differ- 
ently as well. Indeed it turns out that the state value is far from the mean random 
cross-correlation 0.272 ±0.018, whereas the county one is not (0.419 ± 0.023). 
Indeed, the empirical p-values, determined as the fraction of random correlations 
above the observed true one, is in the former and 0.84 in the latter case. 

In order to compare cases with large deviation from the distribution, we deter- 
mine the z-score i.e. the distance of the absolute cross-correlation from the mean of 
the null model normalized by the standard deviation: 
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_ a{b,bM)-E{a{b,bR)) 
' std{a{b,bR)) 

where E denotes mean and std standard deviation. In the state case, this z-score 
is very high, 9.46, which means that the observed correlation is more than 9 stan- 
dard deviations away from the random mean. In contrast the county z-score is 0.90, 
which means that the observation is within one standard deviation and hence not 
significant. 

We summarize the calculated cross-correlations in Tables [4] and [5]for but and bs- 



Table 4 Comparing boundary overlaps for various boundary networks with the modularity bound- 
aries and the corresponding null model h/^ using absolute cross-correlation a. 



boundary network 


a{-,bM) 


a{-,bR) 


3- value 


z-score 


modularity boundaries 


1.000 


0.495 ±0.028 


< 10-3 


18.15 


SPT communities 


0.552 


0.385 ±0.024 


< 10-3 


7.03 


state boundaries 


0.439 


0.272±0.018 


< 10-3 


9.46 


county boundaries 


0.398 


0.419 ±0.023 


0.84 


0.90 


gravity boundaries 


0.260 


0.253 ±0.019 


0.35 


0.40 


large-range network boundaries 


0.198 


0.181±0.017 


0.14 


1.02 


federal reserve district boundaries 


0.377 


0.227 ±0.019 


< 10-3 


7.91 


economic area boundaries 


0.452 


0.307±0.018 


< 10-3 


8.024 



Table 5 Comparing boundary overlaps for various boundary networks with the SPT-based bound- 
ary bs and the corresponding null model bf; using absolute cross-correlation a. 



boundary network a(-,bf;) p-value z-score 

modularity boundaries 0.552 0.251 ±0.013 < 10-3 22.55 

SPT communities 1.000 0.367 ±0.0164 < 10-3 40.63 

state boundaries 0.358 0.220±0.0138 < 10-3 10.99 

county boundaries 0.569 0.562± 0.016 0.36 0.44 

gravity boundaries 0.305 0.260± 0.016 0.002 2.73 

large-range network boundaries 0.257 0.199 ±0.015 < 10-3 3.94 

federal reserve district boundaries 0.307 0.159±0.013 < 10-3 11.79 

economic area boundaries 0.492 0.318±0.013 < 10-3 13.29 
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7,5 Discussion 

For the state and the SPT boundaries we observe a strong deviation from the null 
model when comparing against the modularity boundaries. So we can conclude that 
both state boundaries and SPT boundaries are more similar to bM than expected by 
chance with a p- value < 10^^. 

This is not the case for the gravity model, the county boundaries and the long- 
range model. In these cases, the cross-correlation with bM is not larger than with a 
random model (p-value w 0.44, sa 0.84 and w 0.14). This means that they do not 
significantly coincide with bM- 

The absolute cross-correlation of the FRB boundaries with bM is a{bf,bM) = 
0.38, which is significantly high when compared with the null model, which exhibits 
cross-correlations of only a{bf,bR) ~ 0.23 ±0.019. We observe a strong deviation 
from the null model and can therefore conclude that the FRB boundaries are more 
similar to bM than expected by chance with a p-value < 10^^. 

The corresponding z-score equals 7.91, which is lower than the one for states 
(9.46). This implies that the modularity boundaries' overlap with the states is larger 
than the one with the FRB boundaries. 

We interpret the results on the FRB boundaries when compared with bM as fol- 
lows: 

• The structure of bM may be (partially) due to political structure i.e. result from 
bg or due to additional money transport within FRB districts i.e. correlate with 
bf. Since both bs and bf share strong similarities, in each of the two situations, 
we would see overlap with both boundaries, so we can only judge strength of 
overlap with respect to the other boundary. 

• We quantified strength of overlap by deviation from the null model, and the cor- 
responding z-score was more than 1.5 standard deviations higher for the state 
model. This stronger overlap of states with bf therefore favors the first hypothe- 
sis i.e. the situation that political boundaries are a stronger factor for the pattern 
observed in bM- In the case of dominance of the second hypothesis, we would in- 
stead expect to still see overlap with state boundaries, but less overlap than with 
the FRB ones. 
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